Fix links
Make a gif of pymeanshift samples with variables/stats
Visualize color values
Photography has been a hobby of mine for most of my life, and I found a particular niche in abstract photography, specifically multi-exposure images. This background inspired me to find mathematical ways to analyze my photo library as a whole, with a special focus on color trends and affinities.
The processes used in this project could have a business application in a mobile app: by evaluating a user’s camera roll, the app could discern the user's favorite colors and suggest products that match that color profile.
Note:
Shout out to my buddy Phil! He was a great resource for feedback and encouragement as I formulated my processing script, and he also donated runtime on his computer to process 250 of the images used in this dataset.
A photograph with 4,000 pixels may have up to 4,000 different color values represented. I wanted to “clump” pixels with similar colors in the same area of an image into a single color value. PyMeanShift accomplishes this by taking in the image and three numerical variables: spatial radius, range radius, and minimum density. These refer to maximum placement difference, maximum color difference, and minimum “clump” size, respectively.
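To make the roles of the two radii concrete, here is a toy mean-shift filter over a single row of grayscale intensities in plain Python. It is not PyMeanShift's implementation: PyMeanShift works on full color images and also enforces the minimum density, which this sketch skips. The helper names and the 1-D simplification are hypothetical, chosen only for illustration.

```python
def mean_shift_filter_1d(values, spatial_radius, range_radius, max_iter=20, tol=1e-3):
    """Toy mean-shift filter over one row of intensities (hypothetical helper).

    Each pixel starts at (position, intensity) and repeatedly moves to the mean of
    all pixels that are within spatial_radius in placement AND within range_radius
    in color, until it stops moving. Pixels that settle on the same mode share a value.
    """
    n = len(values)
    filtered = []
    for i in range(n):
        x, v = float(i), float(values[i])
        for _ in range(max_iter):
            window = [(j, values[j]) for j in range(n)
                      if abs(j - x) <= spatial_radius and abs(values[j] - v) <= range_radius]
            if not window:  # nothing left in the joint window; stop shifting
                break
            new_x = sum(pos for pos, val in window) / len(window)
            new_v = sum(val for pos, val in window) / len(window)
            converged = abs(new_x - x) < tol and abs(new_v - v) < tol
            x, v = new_x, new_v
            if converged:
                break
        filtered.append(round(v))
    return filtered


def count_regions(filtered):
    """Count runs of identical neighboring values, i.e. the color 'clumps'."""
    return sum(1 for i, v in enumerate(filtered) if i == 0 or v != filtered[i - 1])


row = [10, 12, 11, 200, 202, 201]
clumped = mean_shift_filter_1d(row, spatial_radius=2, range_radius=20)
print(clumped, count_regions(clumped))  # [11, 11, 11, 201, 201, 201] 2
```

Raising range_radius merges more distinct colors into one clump; raising spatial_radius lets pixels that sit farther apart pull on each other.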
FlickrID - unique identifier for each photo
DateTimeOriginal/CreateDate/ModifyDate - attempted to capture whether the images were edited on the phone (unsuccessful)
Software - iOS version or mobile app used for photo capture
LensInfo/LensModel - data on which phone lens captured the photo
JFIFVersion - compression marker applied by some third-party apps; it disappears when the image is edited in the native iOS Photos app.
ISO - light sensitivity setting
ExposureTime - in seconds (fractions)
FNumber - aperture
FocalLength - fixed for each LensInfo/LensModel
FocalLengthIn35mmFormat - iOS interpretation of zoom level
BrightnessValue - Auto-generated brightness value
SubjectArea - Coordinate values generated by iOS (not directly relevant to this project, but captured for future use)
A Python class was used to gather the image data as attributes, which were then dumped to a CSV with vars().
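A minimal sketch of the vars()-to-CSV pattern, using only the standard library. The ImageRecord class and its handful of attributes are hypothetical stand-ins for the real processing class, which collected many more fields.

```python
import csv
import io


class ImageRecord:
    """Hypothetical stand-in for the processing class: one instance per (sub-)image."""

    def __init__(self, using_id, sub_img, img_width, img_height):
        self.using_id = using_id
        self.sub_img = sub_img
        self.img_width = img_width
        self.img_height = img_height


def dump_records(records, stream):
    """Write each record's attribute dict (from vars()) as one CSV row."""
    rows = [vars(r) for r in records]
    writer = csv.DictWriter(stream, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
dump_records([ImageRecord("123", 0, 4032, 3024), ImageRecord("123", 5, 1344, 1008)], buffer)
print(buffer.getvalue())
```

vars() returns the instance's attribute dictionary, so the columns come out in the order the attributes were first assigned. By default, DictWriter raises a ValueError if a later record has an attribute that the first one lacked.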
All relevant attributes/variables are described below:
using_id - Flickr ID
img_width - in pixels
img_height - in pixels
do_img_at - timestamp for evaluating processing time
sub_img - 0 for whole image, 1 for top-left, 2 for middle-left, 5 for center, etc.
full_id - concatenation of the Flickr ID and sub_img, forming a unique identifier
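The sub_img numbering (1 top-left, 2 middle-left, 5 center) is consistent with a 3×3 grid numbered down each column. The sketch below assumes that layout to compute a crop box for each sub-image. The box follows the (left, upper, right, lower) convention used by image libraries such as Pillow. The "_" separator in full_id is also a hypothetical choice, because the exact concatenation format isn't stated.

```python
def sub_image_box(sub_img, width, height, grid=3):
    """Return the (left, upper, right, lower) crop box for a sub-image number.

    Assumes a 3x3 grid numbered down each column (1 top-left, 2 middle-left,
    3 bottom-left, 4 top-center, 5 center, ...), with 0 meaning the whole image.
    """
    if sub_img == 0:
        return (0, 0, width, height)
    col, row = divmod(sub_img - 1, grid)
    cell_w, cell_h = width // grid, height // grid
    return (col * cell_w, row * cell_h, (col + 1) * cell_w, (row + 1) * cell_h)


def make_full_id(flickr_id, sub_img):
    """Hypothetical full_id format: Flickr ID and sub-image number joined by '_'."""
    return f"{flickr_id}_{sub_img}"


print(sub_image_box(5, 3000, 3000))  # center cell: (1000, 1000, 2000, 2000)
print(sub_image_box(2, 3000, 3000))  # middle-left: (0, 1000, 1000, 2000)
print(make_full_id("12345", 5))      # 12345_5
```

Integer division drops any leftover pixels along the right and bottom edges when the image size isn't a multiple of 3.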
RGB Overview Statistics
(r/g/b)_min - (3 columns) Minimum red/green/blue channel value in the whole image
(r/g/b)_max - (3 columns) Maximum red/green/blue channel value
(r/g/b)_mean - (3 columns) Average red/green/blue channel value
(r/g/b)_mode - (3 columns) count of the most common red/green/blue channel value (I forgot to capture the value itself :facepalm:; in my attempt to capture it, I neglected to reset the index of the pandas Series)
center_rgb - (tuple) R/G/B value of the pixel mathematically in the center of the image
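Here is a sketch of the RGB overview statistics that keeps both the mode value and its count. It uses collections.Counter rather than pandas. With pandas, value_counts() returns the values as the Series index, so `.index[0]` holds the mode value and `.iloc[0]` holds its count. The `*_mode_value` keys are hypothetical extra columns that the original data doesn't have.

```python
from collections import Counter


def channel_stats(pixels):
    """Min/max/mean/mode for each RGB channel of a list of (r, g, b) tuples."""
    stats = {}
    for idx, name in enumerate("rgb"):
        channel = [p[idx] for p in pixels]
        mode_value, mode_count = Counter(channel).most_common(1)[0]
        stats[f"{name}_min"] = min(channel)
        stats[f"{name}_max"] = max(channel)
        stats[f"{name}_mean"] = sum(channel) / len(channel)
        stats[f"{name}_mode_value"] = mode_value  # hypothetical extra column
        stats[f"{name}_mode"] = mode_count        # the count, like the original column
    return stats


def center_pixel(rows):
    """Pixel at the mathematical center of a row-major grid of pixels."""
    return rows[len(rows) // 2][len(rows[0]) // 2]


print(channel_stats([(10, 20, 30), (10, 40, 30), (200, 40, 90), (10, 40, 30)]))
```

For even image dimensions, center_pixel picks the lower-right pixel of the middle four.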
The next segment of columns was captured from the segmented/posterized image
post_num_regions - number of color “clumps” after processing
post_top_hsl - (tuple) most common pixel value
post_top_count - quantity of most common pixel value
post_(2-6)_hsl - (5 columns)(tuples) next most common pixel values, in descending order of frequency
post_(2-6)_count - (5 columns) counts for their respective common pixel values
center_hsl - (tuple) HSL value of the pixel mathematically in the center of the image
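A sketch of how the post_top/post_(2-6) columns could be derived from a posterized image. It uses Python's colorsys module, which returns (hue, lightness, saturation) on a 0-1 scale, so the tuple order is swapped below. The output scale (hue in degrees, saturation and lightness in percent) is an assumption, because the original script's HSL scale isn't stated.

```python
import colorsys
from collections import Counter


def rgb_to_hsl(rgb):
    """Convert 0-255 RGB to (hue degrees, saturation %, lightness %) (assumed scale)."""
    r, g, b = (c / 255 for c in rgb)
    hue, light, sat = colorsys.rgb_to_hls(r, g, b)  # note: colorsys order is h, l, s
    return (round(hue * 360), round(sat * 100), round(light * 100))


def top_colors(pixels, n=6):
    """The n most common HSL values and their counts, in descending order of frequency."""
    return Counter(rgb_to_hsl(p) for p in pixels).most_common(n)


posterized = [(255, 0, 0)] * 5 + [(0, 0, 255)] * 3 + [(128, 128, 128)]
for rank, (hsl, count) in enumerate(top_colors(posterized), start=1):
    print(rank, hsl, count)
```

A posterized image with fewer than six distinct colors returns fewer than six pairs. This is presumably why post_2_hsl through post_6_hsl can be empty and are later filled with "(-1, -1, -1)".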
Hue bands were set by subjective eyeball measurement
All hues: red, orange, yellow, green, cyan, blue, purple, magenta
full_(hue)_count - count of all pixels that fell within the hue band, regardless of saturation and lightness
visib_(hue)_count - count of pixels in the hue band deemed as “visibly [hue]” (saturation over 40%, lightness between 20% and 75%)
vivid_(hue)_count - count of pixels in the hue band deemed as “vividly [hue]” (saturation over 70%, lightness between 30% and 70%)
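The full/visible/vivid counts can be sketched as follows. The band edges in degrees are hypothetical, because the original bands were set by eye and aren't listed. Treating the saturation and lightness bounds as strict inequalities is also an assumption. Magenta is keyed as "mag" to match the full_mag_count column used in the R code.

```python
# Hypothetical band edges in degrees; the original eyeballed bands aren't listed.
HUE_BANDS = [
    ("red", 345, 15),      # wraps around 0 degrees
    ("orange", 15, 45),
    ("yellow", 45, 70),
    ("green", 70, 165),
    ("cyan", 165, 195),
    ("blue", 195, 255),
    ("purple", 255, 290),
    ("mag", 290, 345),     # magenta; the columns use "mag"
]


def hue_band(hue):
    """Name of the band containing a hue, wrapped into [0, 360)."""
    hue %= 360
    for name, start, end in HUE_BANDS:
        if start <= end:
            if start <= hue < end:
                return name
        elif hue >= start or hue < end:  # band that wraps past 360
            return name
    raise ValueError(f"hue {hue} is not covered by any band")


def band_counts(hsl_pixels):
    """full_/visib_/vivid_ counts per band for (hue deg, sat %, light %) pixels."""
    counts = {f"{level}_{name}_count": 0
              for name, _, _ in HUE_BANDS for level in ("full", "visib", "vivid")}
    for hue, sat, light in hsl_pixels:
        name = hue_band(hue)
        counts[f"full_{name}_count"] += 1
        if sat > 40 and 20 < light < 75:   # "visibly [hue]"
            counts[f"visib_{name}_count"] += 1
        if sat > 70 and 30 < light < 70:   # "vividly [hue]"
            counts[f"vivid_{name}_count"] += 1
    return counts


print(band_counts([(10, 80, 50), (10, 50, 50), (10, 10, 50), (220, 90, 90)]))
```

With these thresholds, every vivid pixel is also a visible pixel, so vivid counts never exceed visible counts for the same hue.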
Saturation Statistics
sat_min_val - lowest saturation value in image
sat_25_val - 25th percentile (first quartile)
sat_50_val - median saturation
sat_75_val - 75th percentile (third quartile)
sat_max_val - highest saturation value
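A sketch of the saturation summary using the standard library's statistics.quantiles. The "inclusive" method treats the minimum and maximum as the 0th and 100th percentiles, which is the same linear interpolation that NumPy's percentile uses by default. The quartile method used by the original script isn't stated, so this choice is an assumption.

```python
import statistics


def saturation_summary(saturations):
    """Min, quartiles, and max of per-pixel saturation values."""
    q25, q50, q75 = statistics.quantiles(saturations, n=4, method="inclusive")
    return {
        "sat_min_val": min(saturations),
        "sat_25_val": q25,
        "sat_50_val": q50,
        "sat_75_val": q75,
        "sat_max_val": max(saturations),
    }


print(saturation_summary([0, 10, 20, 30, 40]))
```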
HSL Mean Values
hue_mean_val - average hue value (not incredibly meaningful on a looping spectrum)
sat_mean_val - average saturation value
light_mean_val - average lightness
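The arithmetic mean of hue breaks down because hue wraps around at 360°: two reds at 350° and 10° average to 180°, on the opposite side of the wheel. A circular mean avoids this by averaging the angles as unit vectors. This is a swapped-in technique, not the statistic the original script computed.

```python
import math


def circular_mean_hue(hues):
    """Mean direction of hue angles (degrees), respecting the 0/360 wraparound.

    Returns None when the hues cancel out (e.g. 0 and 180), since no mean
    direction exists in that case.
    """
    sin_sum = sum(math.sin(math.radians(h)) for h in hues)
    cos_sum = sum(math.cos(math.radians(h)) for h in hues)
    if not hues or math.hypot(sin_sum, cos_sum) < 1e-9 * len(hues):
        return None
    angle = math.degrees(math.atan2(sin_sum, cos_sum)) % 360
    return 0.0 if angle == 360.0 else angle  # float rounding can land exactly on 360


hues = [350, 10]                # two reds on either side of 0 degrees
print(sum(hues) / len(hues))    # arithmetic mean: 180.0
print(circular_mean_hue(hues))  # close to 0 degrees (red)
```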
Lightness Statistics
light_max_val - brightest value
light_max_count - quantity of pixels within 1.5% (literal) of the max lightness value
light_min_val - darkest value
light_min_count - quantity of pixels within 1.5% (literal) of the minimum lightness value (darkest)
light_25_value - 25th percentile (first quartile)
light_50_value - median lightness
light_75_value - 75th percentile (third quartile)
gen_bright_count - quantity of pixels with over 85% lightness
gen_dark_count - quantity of pixels with under 15% lightness
common_hsl_(1-4)_val - (4 columns)(tuple) four most common HSL values
common_hsl_(1-4)_count - (4 columns) quantities of the four most common HSL values
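The extreme-lightness columns in the lightness statistics above can be sketched as follows. This assumes per-pixel lightness on a 0-100 scale. It reads "within 1.5% (literal)" as a window of 1.5 lightness points around the extreme value, not 1.5% of that value. Both readings are assumptions.

```python
def lightness_counts(lightness, window=1.5, bright=85, dark=15):
    """Extreme-lightness columns for per-pixel lightness on a 0-100 scale (assumed).

    window is the "within 1.5% (literal)" tolerance, read here as 1.5 lightness points.
    """
    lo, hi = min(lightness), max(lightness)
    return {
        "light_max_val": hi,
        "light_max_count": sum(1 for v in lightness if v >= hi - window),
        "light_min_val": lo,
        "light_min_count": sum(1 for v in lightness if v <= lo + window),
        "gen_bright_count": sum(1 for v in lightness if v > bright),
        "gen_dark_count": sum(1 for v in lightness if v < dark),
    }


print(lightness_counts([2, 3, 5, 50, 90, 98, 99]))
```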
Because image processing data were collected on multiple computers, multiple files were created for the EXIF and image data, partly by design and partly due to occasional read/write conflicts on shared files. All records were gathered into Excel and checked for duplicates before being exported as CSVs.
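The collation was done in Excel. A scripted version could look like the minimal Python sketch below, which concatenates CSVs with the same columns and keeps the first row seen for each full_id. The file contents and the full_id format are made up for illustration.

```python
import csv
import io


def merge_csv_files(streams, key="full_id"):
    """Concatenate CSV files with the same columns, keeping the first row per key."""
    seen = set()
    merged = []
    for stream in streams:
        for row in csv.DictReader(stream):
            if row[key] not in seen:
                seen.add(row[key])
                merged.append(row)
    return merged


# Illustrative contents only; in practice these would be open() handles on the CSV files.
file_a = io.StringIO("full_id,sat_mean_val\n123_0,41.2\n123_5,38.0\n")
file_b = io.StringIO("full_id,sat_mean_val\n123_5,38.0\n456_0,12.9\n")
rows = merge_csv_files([file_a, file_b])
print([r["full_id"] for r in rows])  # ['123_0', '123_5', '456_0']
```

Keeping the first occurrence silently drops any later version of a row. If a read/write conflict could have produced two differing rows for one ID, compare them before deduplicating.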
\(H_0\): There is no correlation between time of year and color values
\(H_a\): Warm color values are more prominent between May and September
\(H_0\): There is no correlation between time of day and lightness values
\(H_a\): Lightness values are higher between 6 AM and 6 PM
\(H_0\): There is no correlation between saturation and being a picture of my cat
\(H_a\): Low saturation values are increasingly common over time, especially in central sub-images
\(H_0\): Vivid ratio (percentage of vivid pixels) is uniformly distributed among all Software types
\(H_a\): Vivid ratio is consistently highest in Slow Shutter Cam photos without JFIF values
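One way to write the vivid ratio, assuming it means the share of all pixels in an image (or sub-image) that fall in any vivid band. Here \(V_h\) is vivid_(hue)_count and \(F_h\) is full_(hue)_count for hue band \(h\). The denominator is the same sum as the total_pixels column computed in the R cleaning step.

\[
\text{vivid ratio} = \frac{\sum_{h \in H} V_h}{\sum_{h \in H} F_h},
\qquad H = \{\text{red}, \text{orange}, \text{yellow}, \text{green}, \text{cyan}, \text{blue}, \text{purple}, \text{magenta}\}
\]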
Pre Import:
Sub-image data for main images (0) was buggy during the first hours of image processing. This was fixed in Excel during the data collation stage.
EXIF:
# Convert CamelCase EXIF tag names to snake_case (e.g. DateTimeOriginal -> date_time_original)
names(exif) <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", names(exif))
names(exif) <- names(exif) %>% tolower()
exif_tidy <- select(exif, -c(date_time_original, modify_date, lens_info, fnumber, focal_length))
exif_tidy <- replace_na(exif_tidy, list(subject_area = "0 0 0 0", jfifversion = 0))
Img_data:
imgsd_tidy <- select(img_data, -c(flickr, img_loc, the_image, img_width, img_height, crop_coords, do_img_at, r_mode, b_mode, g_mode))
imgsd_tidy <- replace_na(imgsd_tidy, list(
post_2_hsl = "(-1, -1, -1)",
post_3_hsl = "(-1, -1, -1)",
post_4_hsl = "(-1, -1, -1)",
post_5_hsl = "(-1, -1, -1)",
post_6_hsl = "(-1, -1, -1)"
)
)
EXIF
# EXIF timestamps look like "YYYY:MM:DD HH:MM:SS": split off the time, then split the date
exif_tidy <- exif_tidy %>%
  separate(create_date, into = c('date', 'time'), sep = " ", remove = TRUE) %>%
  separate(date, into = c('year', 'month', 'day'), sep = ":")
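# Map every photo onto one dummy year (1881) so dates line up by season regardless of the year taken.
# 1881 is not a leap year, so any 29 February photos become NA here.
exif_tidy$date <- as.Date(paste("1881", exif_tidy$month, exif_tidy$day, sep = "-"), format = "%Y-%m-%d")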
Img_data
Count by flickr id
subimg_qty <- imgsd_tidy %>% count(using_id)
To my (happy) surprise, only 5 images have fewer than 10 results (the whole image plus nine sub-images), and only 2 have fewer than 6. In the interest of time, I’m noting these IDs by hand and simply removing the images with fewer than 6 results from my working data.
good_ids <- subimg_qty[subimg_qty$n >=6, "using_id"]
imgsd_tidy <- imgsd_tidy %>% filter(using_id %in% good_ids$using_id)
exif_tidy <- exif_tidy %>% filter(flickr_id %in% good_ids$using_id)
imgsd_tidy <- imgsd_tidy %>% mutate(total_pixels = full_red_count +
full_orange_count +
full_yellow_count +
full_green_count +
full_cyan_count +
full_blue_count +
full_purple_count +
full_mag_count)
# Remove both square brackets; str_replace() would only remove the first match
imgsd_tidy$center_hsl <- str_replace_all(imgsd_tidy$center_hsl, '\\[|\\]', '')
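# Strip tuple parentheses from every column; note that this coerces all columns to character
imgsd_tidy <- imgsd_tidy %>% mutate(across(everything(), ~ gsub('\\(|\\)', '', .x)))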
# Split each "a, b, c" tuple string into three columns.
# sep = ",\\s*" also drops the space after each comma, and convert = TRUE
# turns the pieces back into numbers so they plot on continuous axes.
imgsd_split <- imgsd_tidy %>%
  separate(center_rgb,
           into = c('center_r', 'center_g', 'center_b'),
           sep = ',\\s*', convert = TRUE) %>%
  separate(post_top_hsl,
           into = c('post_top_hue', 'post_top_sat', 'post_top_light'),
           sep = ',\\s*', convert = TRUE) %>%
  separate(post_2_hsl,
           into = c('post_2_hue', 'post_2_sat', 'post_2_light'),
           sep = ',\\s*', convert = TRUE) %>%
  separate(post_3_hsl,
           into = c('post_3_hue', 'post_3_sat', 'post_3_light'),
           sep = ',\\s*', convert = TRUE) %>%
  separate(post_4_hsl,
           into = c('post_4_hue', 'post_4_sat', 'post_4_light'),
           sep = ',\\s*', convert = TRUE) %>%
  separate(post_5_hsl,
           into = c('post_5_hue', 'post_5_sat', 'post_5_light'),
           sep = ',\\s*', convert = TRUE) %>%
  separate(post_6_hsl,
           into = c('post_6_hue', 'post_6_sat', 'post_6_light'),
           sep = ',\\s*', convert = TRUE) %>%
  separate(center_hsl,
           into = c('center_hue', 'center_sat', 'center_light'),
           sep = ',\\s*', convert = TRUE) %>%
  separate(common_hsl_1_val,
           into = c('common_hsl_1_hue', 'common_hsl_1_sat', 'common_hsl_1_light'),
           sep = ',\\s*', convert = TRUE) %>%
  separate(common_hsl_2_val,
           into = c('common_hsl_2_hue', 'common_hsl_2_sat', 'common_hsl_2_light'),
           sep = ',\\s*', convert = TRUE) %>%
  separate(common_hsl_3_val,
           into = c('common_hsl_3_hue', 'common_hsl_3_sat', 'common_hsl_3_light'),
           sep = ',\\s*', convert = TRUE) %>%
  separate(common_hsl_4_val,
           into = c('common_hsl_4_hue', 'common_hsl_4_sat', 'common_hsl_4_light'),
           sep = ',\\s*', convert = TRUE)
I probably should have saved each of those values as its own column during the Python image processing stage.
To be added as needed:
https://plotly.com/r/3d-scatter-plots/
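# sub_img has 10 levels (0-9), but the default qualitative palette (Set2) has only 8 colors,
# which triggers "n too large" warnings; the Paired palette has 12.
fig <- plot_ly(imgsd_split, x = ~center_r, y = ~center_g, z = ~center_b,
               type = "scatter3d", mode = "markers", size = 1,
               color = ~sub_img, colors = "Paired")
fig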
fig <- plot_ly(imgsd_split, x = ~r_mean, y = ~g_mean, z = ~b_mean,
               type = "scatter3d", mode = "markers", size = 2,
               color = ~sub_img, colors = "Paired")
fig
fig <- plot_ly(imgsd_split, x = ~hue_mean_val, y = ~sat_mean_val, z = ~light_mean_val,
               type = "scatter3d", mode = "markers", size = 1,
               color = ~sub_img, colors = "Paired")
fig
fig <- plot_ly(imgsd_split, x = ~post_top_hue, y = ~post_top_sat, z = ~post_top_light,
               type = "scatter3d", mode = "markers", size = 2,
               color = ~sub_img, colors = "Paired")
fig
fig <- plot_ly(imgsd_split,
               x = ~common_hsl_1_hue,
               y = ~common_hsl_1_sat,
               z = ~common_hsl_1_light,
               type = "scatter3d", mode = "markers", size = 2,
               color = ~sub_img, colors = "Paired")
fig
fig <- plot_ly(imgsd_split,
               x = ~common_hsl_2_hue,
               y = ~common_hsl_2_sat,
               z = ~common_hsl_2_light,
               type = "scatter3d", mode = "markers", size = 2,
               color = ~sub_img, colors = "Paired")
fig
fig <- plot_ly(imgsd_split,
               x = ~common_hsl_3_hue,
               y = ~common_hsl_3_sat,
               z = ~common_hsl_3_light,
               type = "scatter3d", mode = "markers", size = 2,
               color = ~sub_img, colors = "Paired")
fig
Warning: Ignoring 2 observations
fig <- plot_ly(imgsd_split,
               x = ~common_hsl_4_hue,
               y = ~common_hsl_4_sat,
               z = ~common_hsl_4_light,
               type = "scatter3d", mode = "markers", size = 2,
               color = ~sub_img, colors = "Paired")
fig
Warning: Ignoring 2 observations
fig <- plot_ly(imgsd_split,
               x = ~common_hsl_4_hue,
               y = ~common_hsl_4_sat,
               type = "scatter", mode = "markers", size = 2,
               color = ~sub_img, colors = "Paired")
fig
fig <- plot_ly(imgsd_split,
               x = ~light_min_val,
               y = ~light_max_val,
               type = "scatter", mode = "markers", size = 2,
               color = ~sub_img, colors = "Paired")
fig
# These count columns are still character after the parenthesis-stripping step
imgsd_split$gen_bright_count <- as.integer(imgsd_split$gen_bright_count)
imgsd_split$gen_dark_count <- as.integer(imgsd_split$gen_dark_count)
fig <- plot_ly(imgsd_split,
               x = ~gen_bright_count,
               y = ~gen_dark_count,
               type = "scatter", mode = "markers", size = 2,
               color = ~sub_img, colors = "Paired")
fig
fig <- plot_ly(imgsd_split,
               x = ~sat_min_val,
               y = ~sat_max_val,
               type = "scatter", mode = "markers", size = 2,
               color = ~sub_img, colors = "Paired")
fig
- Trim EXIF down to ID/date-time/software/brightness info
- Left join to add date/software data from EXIF to img_data
- *Do facets with sub_img numbers*